AITopics | non-convex sgd

Collaborating Authors

non-convex sgd

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Momentum-Based Variance Reduction in Non-Convex SGD

Neural Information Processing SystemsDec-25-2025, 22:02:25 GMT

Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the converge rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and willingness to use excessively large mega-batches in order to achieve their improved results. We present a new algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. On smooth losses $F$, STORM finds a point $x$ with $\mathbb{E}[\|\nabla F(x)\|]\le O(1/\sqrt{T}+\sigma^{1/3}/T^{1/3})$ in $T$ iterations with $\sigma^2$ variance in the gradients, matching the best-known rate but without requiring knowledge of $\sigma$.

momentum-based variance reduction, name change, non-convex sgd, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Reviews: Momentum-Based Variance Reduction in Non-Convex SGD

Neural Information Processing SystemsJan-26-2025, 15:54:52 GMT

I agree with R3 that you did a poor job on relating your work to existing methods, in particular SARAH. Please also make sure that you carefully address the question of optimality. I also realized that your method in fact has nothing to do with momentum. Consider for instance deterministic objective, f(x, \xi) f(x). If one has a tight estimate, i.e. d_{t-1} abla f(x_{t-1}), then from your update rules it follows that d_t abla f(x_t), i.e. the method become gradient descent with no momentum! Your title, thus, is very confusing and I highly encourage you to change it.

experiment, momentum-based variance reduction, non-convex sgd, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.39)

Add feedback

Momentum-Based Variance Reduction in Non-Convex SGD

Neural Information Processing SystemsOct-10-2024, 19:29:17 GMT

Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the converge rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and willingness to use excessively large "mega-batches" in order to achieve their improved results. We present a new algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. On smooth losses F, STORM finds a point x with \mathbb{E}[\ abla F(x)\ ]\le O(1/\sqrt{T} \sigma {1/3}/T {1/3}) in T iterations with \sigma 2 variance in the gradients, matching the best-known rate but without requiring knowledge of \sigma .

momentum-based variance reduction, non-convex sgd, stochastic gradient descent, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Momentum-Based Variance Reduction in Non-Convex SGD

Cutkosky, Ashok, Orabona, Francesco

Neural Information Processing SystemsMar-19-2020, 03:02:01 GMT

Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the converge rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and willingness to use excessively large "mega-batches" in order to achieve their improved results. We present a new algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. Papers published at the Neural Information Processing Systems Conference.

momentum-based variance reduction, non-convex sgd, stochastic gradient descent, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback